In this document, we'll explore the 
PySpark Runtime Architecture. 


Submitting our application to the cluster 
using the spark-submit command 


Spark Submit Command 


# Target: a 4 node cluster
spark-submit \
--master yarn \
--deploy-mode cluster \
--executor-cores 4 \
--num-executors 2 \
--executor-memory 16G \
--driver-memory 16G \
/path/to/PySpark.py
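
To make the flags above concrete, here is a rough sketch of the resources this command asks YARN for. The variable names are made up for illustration. The memory total covers only the requested heap sizes, not the per-container memory overhead that Spark adds on top.

```python
# Values copied from the spark-submit flags above.
num_executors = 2        # --num-executors
executor_cores = 4       # --executor-cores
executor_memory_gb = 16  # --executor-memory
driver_memory_gb = 16    # --driver-memory

# Each executor core can run one task at a time
# (assuming the default of one CPU per task).
total_task_slots = num_executors * executor_cores

# Requested heap memory only; container overhead is NOT included.
total_memory_gb = num_executors * executor_memory_gb + driver_memory_gb

print(f"task slots: {total_task_slots}, requested memory: {total_memory_gb}G")
```

So the application can run at most 8 tasks in parallel and requests 48G of heap memory across the driver and the executors.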


What happens when we submit our 
application to the cluster? 
The driver breaks the application code into jobs, one per action, and 
the DAG scheduler splits each job into stages at the wide 
transformations (shuffle boundaries). 
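
The stage-splitting rule can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark's actual DAG scheduler: the function name `split_into_stages` and the small set of wide transformations are assumptions chosen for the example.

```python
# Toy model only: a hypothetical helper, not part of the Spark API.
# A few transformations that require a shuffle (wide transformations).
WIDE_TRANSFORMATIONS = {"groupBy", "join", "repartition", "distinct", "orderBy"}

def split_into_stages(transformations):
    """Group an ordered list of transformation names into stages.

    A new stage starts at every wide transformation, because the
    shuffle it triggers forms a stage boundary.
    """
    stages = [[]]  # stage 0 always exists: it reads the source data
    for op in transformations:
        if op in WIDE_TRANSFORMATIONS:
            stages.append([])  # shuffle boundary: open a new stage
        stages[-1].append(op)
    return stages

# Lineage of a single job (everything that leads up to one action).
lineage = ["filter", "withColumn", "groupBy", "select", "orderBy"]
for i, stage in enumerate(split_into_stages(lineage)):
    print(f"stage {i}: {stage}")
```

In this model a job with two wide transformations ends up with three stages, and the narrow transformations are pipelined within the stage they fall into.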
The driver asks the resource manager 


to allocate the resources required for job execution. 
The driver launches the executors on the allocated resources. 


Executors perform the actual processing of the data by executing 
the tasks assigned to them by the driver. 
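
As a rough illustration of how tasks map onto executors, the sketch below estimates how many "waves" of tasks a stage needs. It assumes one task per partition, one core per task, and equally long tasks, which is an idealization. The helper name `task_waves` is hypothetical, and its defaults come from the spark-submit command in this document.

```python
import math

def task_waves(num_partitions, num_executors=2, executor_cores=4):
    """Estimate how many rounds of tasks a stage needs.

    Assumes one task per partition and one task per executor core.
    """
    slots = num_executors * executor_cores  # tasks that can run at once
    return math.ceil(num_partitions / slots)

# A stage with 200 partitions on 2 executors x 4 cores.
print(task_waves(200))
```

With 8 task slots, a stage of 200 partitions needs 25 waves, while a stage of 8 partitions or fewer finishes in a single wave.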
Have you enjoyed this overview of PySpark's 
Runtime Architecture? 


Arud Seka Berne S 


Follow me for more in-depth technical content on 
big data and related topics. 


